Integrate Citation detector with Metrics #126

matt-bernhardt · 2024-10-17T21:53:41Z

This integrates the citation detector with the monthly metrics setup. It includes a tweak to one of the term fixtures, because the ISBN fixture had ended up getting counted in the citation detector (which is a real phenomenon, but our test fixtures feel like they should be crafted to keep things separate).

Quick note about the side effect

The fact that this needed to tweak a different term fixture is an indication that our detectors are starting to overlap. This is a real phenomenon in production, and is expected as we start to build more detectors.

I do not have a notion for now about how to address this in the more constrained test environment, where we can carefully craft fixtures to do exactly what we intend. It feels like this behavior should be confirmed via explicit tests that address this overlap, but the tests we have so far are focused on one-and-only-one detector, so I feel okay about tweaking things to keep them separate.

We do have a term fixture that trips multiple detectors, confirming that they get counted in all applicable categories, so we may be covered already?

Developer

Ticket(s)

https://mitlibraries.atlassian.net/browse/TCO-107

Accessibility

ANDI or Wave has been run in accordance to our guide and
all issues introduced by these changes have been resolved or opened
as new issues (link to those issues in the Pull Request details above)
There are no accessibility implications to this change

Documentation

Project documentation has been updated, and yard output previewed
No documentation changes are needed

ENV

All new ENV is documented in README.
All new ENV has been added to Heroku Pipeline, Staging and Prod.
ENV has not changed.

Stakeholders

Stakeholder approval has been confirmed
Stakeholder approval is not needed

Dependencies and migrations

NO dependencies are updated

YES migrations are included

Reviewer

Code

I have confirmed that the code works as intended.
Any CodeClimate issues have been fixed or confirmed as
added technical debt.

Documentation

The commit message is clear and follows our guidelines
(not just this pull request message).
The documentation has been updated or is unnecessary.
New dependencies are appropriate or there were no changes.

Testing

There are appropriate tests covering any new functionality.
No additional test coverage is required.

** Why are these changes being introduced: Now that the Detection::Citation model is in place, we need to integrate it with the existing Metrics reporting. ** Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/tco-107 ** How does this address that need: * Adds a citation field (integer) to the metrics_algorithms table * Defines a match_citations method in the Algorithms model, to pass search events through the Citation detector and count those where one is found (via the detection? method). This also includes expanding the event_matches logic to account for this in the unmatched calculations. * Creates a new search_event fixture, using the citation term fixture, to give the algorithm something to count. * As a side effect, we have to tweak the isbn term fixture, because it was unintentionally tripping the citation as well. See the side effect section for a longer discussion of this. * Adds tests to the Algorithm model to account for everything above, copying from other calculations. ** Document any side effects to this change: The fact that this needed to tweak a different term fixture is an indication that our detectors are starting to overlap. This is a real phenomenon in production, and is expected as we start to build more detectors. I do not have a notion for now about how to address this in the more constrained test environment, where we can carefully craft fixtures to do exactly what we intend. It feels like this behavior should be confirmed via explicit tests that address this overlap, but the tests we have so far are focused on one-and-only-one detector, so I feel okay about tweaking things to keep them separate. We do have a term fixture that trips multiple detectors, confirming that they get counted in all applicable categories, so we may be covered already?

JPrevost · 2024-10-18T14:14:17Z

app/models/metrics/algorithms.rb

      journal_exact = process_journals(event, matches)
      suggested_resource_exact = process_suggested_resources(event, matches)
      lcshs = match_lcsh(event, matches)

-      matches[:unmatched] += 1 if ids.detections.blank? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero?
+      matches[:unmatched] += 1 if ids.detections.blank? && !citation.detection? && lcshs.detections.blank? && journal_exact.count.zero? && suggested_resource_exact.count.zero?


I'd like to normalize Citation a bit more after seeing how it behaves differently. Not in this PR, but just eventually. This could take the form of adding the .detection? to all of our Detectors and/or normalizing what .detection returns. No changes are should be made as part of this work, but this difference made me pause and have to understand why it was done this way. It makes sense based on how Citation currently works but feels like it may be confusing in the future as to why it is different.

Yeah, I agree that having a more standardized set of methods across our detectors would be helpful. Some differences might be unavoidable (the citation detector doesn't extract specific patterns, while the standard identifier one does), but looking now I think the others (which all treat the phrase holistically) could benefit from a .detection? method.

matt-bernhardt marked this pull request as ready for review October 17, 2024 21:53

mitlib temporarily deployed to tacos-api-pipeline-pr-126 October 17, 2024 21:55 Inactive

matt-bernhardt requested review from JPrevost and jazairi October 18, 2024 13:58

JPrevost self-assigned this Oct 18, 2024

JPrevost approved these changes Oct 18, 2024

View reviewed changes

matt-bernhardt merged commit 53b4e6a into main Oct 18, 2024
3 of 5 checks passed

matt-bernhardt deleted the tco-97-metrics branch October 18, 2024 15:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate Citation detector with Metrics #126

Integrate Citation detector with Metrics #126

matt-bernhardt commented Oct 17, 2024 •

edited by JPrevost

Loading

JPrevost Oct 18, 2024

matt-bernhardt Oct 18, 2024

Integrate Citation detector with Metrics #126

Integrate Citation detector with Metrics #126

Conversation

matt-bernhardt commented Oct 17, 2024 • edited by JPrevost Loading

Quick note about the side effect

Developer

Ticket(s)

Accessibility

Documentation

ENV

Stakeholders

Dependencies and migrations

Reviewer

Code

Documentation

Testing

JPrevost Oct 18, 2024

Choose a reason for hiding this comment

matt-bernhardt Oct 18, 2024

Choose a reason for hiding this comment

matt-bernhardt commented Oct 17, 2024 •

edited by JPrevost

Loading